Abstract



We introduce a new framework for web page 
ranking — reinforcement ranking — that improves 
the stability and accuracy of Page Rank while elim- 
inating the need for computing the stationary dis- 
tribution of random walks. Instead of relying 
on teleportation to ensure a well defined Markov 
chain, we develop a reverse-time reinforcement 
learning framework that determines web page au- 
thority based on the solution of a reverse Bell- 
man equation. In particular, for a given reward 
function and surfing policy we recover a well de- 
fined authority score from a reverse-time perspec- 
tive: looking back from a web page, what is the 
total incoming discounted reward brought by the 
surfer from the page's predecessors? This re- 
sults in a novel form of reverse-time dynamic- 
programming/reinforcement-learning problem that 
achieves several advantages over Page Rank based 
methods: First, stochasticity, ergodicity, and irreducibility
of the underlying Markov chain are no
longer required for well-posedness. Second, the
method is less sensitive to graph topology and more 
stable in the presence of dangling pages. Third, 
not only does the reverse Bellman iteration yield a 
more efficient power iteration, it allows for faster 
updating in the presence of graph changes. Fi- 
nally, our experiments demonstrate improvements 
in ranking quality. 



1 Introduction 

Page Rank is a dominant link analysis algorithm for web page 
ranking [27, 24, 26], which has been applied to a wide range
of problems in information retrieval and social network anal- 
ysis ll32l, Bl [35l l2ll . Under Page Rank, authoritativeness is de- 
fined by the stationary distribution of a Markov chain con- 
structed from the web link structure ll30l I20I |6l IH [TH |26l HU . 
On each page, a model surfer follows a random link, jumping 
to the linked page and continuing to follow a random link. 
Thus, the sequence of visited pages forms a Markov chain: the
next page visited depends only on the page the surfer is
currently visiting. The rank of a web page is then defined as



the probability of visiting the page in a long run of this ran- 
dom walk. Unfortunately, this simple protocol does not al- 
low the surfer to proceed from a page that has no outgoing 
links — such pages are called dangling pages. In these cases, 
the Markov chain derived from the link structure of the Web 
is not necessarily irreducible or aperiodic, which are required 
to guarantee the existence of the stationary distribution. To 
circumvent these problems, a teleportation operator is intro- 
duced that allows the surfer to escape dangling pages by fol- 
lowing artificial links added to the Web graph. Teleportation 
has been widely adopted in the literature, leading to the well
accepted stationary distribution formulation of authority
ranking, see e.g. Hi |H, USES [M Ell-

However, if we consider real search behavior, teleportation 
is obviously artificial. It is unnatural to propagate the score of 
a page to other unlinked pages, thus teleportation contributes 
a blind regularization effect rather than any real information. 
In fact, teleportation contradicts the basic hypothesis of Page 
Rank: through teleportation, pages that are not linked by a 
page still receive reinforcement from the page. Teleportation 
was primarily introduced to guarantee the existence of the sta- 
tionary distribution. In this paper, we show that teleportation 
is in fact unnecessary for identifying authoritative pages on 
the Web. First, contrary to widely accepted belief, telepor- 
tation is not required to derive a convergent power iteration 
for global Page Rank style authority scores. Second, although
it is widely adopted in the random surfer interpretation of
Page Rank, teleportation (or even a random walk) is also
conceptually unnecessary. We introduce a new approach to
defining web page authority that is based on a novel reinforcement
learning model that avoids the use of teleportation while re- 
maining well defined. We prove that the authority function 
is well posed and satisfies a reverse Bellman equation. We 
also prove that the induced reverse Bellman iteration, which 
is more efficient than the Page Rank procedure, is guaranteed
to converge for any discount factor in (0, 1).

In addition to establishing theoretical soundness, we also 
show that the reinforcement based authority function is less 
sensitive to link changes. This allows us to achieve faster 
updates under graph changes, addressing the Page Rank up- 
dating problem [4] in an efficient new way. As early as 2000, 
it was observed that 23% of the web pages changed their in- 
dex daily |Q1 . Unfortunately, the Page Rank power iteration 
does not benefit significantly from initialization with the
previous stationary distribution [25]. We prove that our authority
function can take better advantage of initialization, and yield
faster updates to graph changes. Furthermore, we demonstrate
that reinforcement ranking can improve on the authority
scores produced by Page Rank in a controlled case study.

Algorithm 1 Standard procedure for computing Page Rank:
efficient power iteration method that exploits G's structure.
    Initialize $x_0$
    repeat
        $x_{k+1} = c H^T x_k$
        $\omega = \|x_k\|_1 - \|x_{k+1}\|_1$
        $x_{k+1} = x_{k+1} + \omega v$
    until desired accuracy is reached

2 Page Rank 

We first briefly review the formulation of Page Rank |0 |H 
[l2l l26ll . Suppose there are N pages in the Web graph un- 
der consideration. Let L denote the adjacency matrix of the 
graph; i.e., $L(i,j) = 1$ if there is a link from page $i$ to page $j$,
otherwise $L(i,j) = 0$. Let $H$ denote the row normalized matrix
of $L$; let $e$ be the vector of all ones; and let $v$ denote the
teleportation vector, which is a normalized probability vector
(assume column vectors). Finally, let $S$ be a stochastic
matrix such that $S = H + a u^T$, where the vector $a$ has
$a_i = 1$ if page $i$ is dangling and $a_i = 0$ otherwise. Here $u$
is a probability vector that is normally set to either $e/N$ or $v$.
Note that adding $a u^T$ to the $H$ matrix artificially "patches"
the dangling pages that block the random surfer.

The transition probability matrix used by Page Rank is 

$$G = cS + (1 - c)\, e v^T,$$

for a convex combination parameter $c \in (0, 1)$. The matrix
$G$ is stochastic, irreducible and aperiodic, and thus its stationary
distribution exists and is unique according to classical
Markov chain theory. In fact, the Page Rank (denoted by $\pi$) is
precisely the stationary distribution vector for $G$, which satisfies
$\pi = G^T \pi$. Page Rank can be interpreted as follows: with
probability $c$ the surfer follows a link, otherwise with probability
$1 - c$ the surfer teleports to a page according to the
distribution $v$; the rank of a page is then given by its long run
visit frequency. Teleportation is key, since it ensures the chain
is irreducible and aperiodic, thus guaranteeing the existence
of a stationary distribution for the surfing process.

Unfortunately, the introduction of teleportation causes the
matrix $G$ to become completely dense. Power iteration is
therefore impractical unless one exploits the special structure
of $G$, namely that it is the sum of a sparse matrix and two
rank-one matrices. An efficient procedure for computing Page Rank
is given in Algorithm 1 [22, 23]. This algorithm evaluates an
equivalent update to $\pi_{k+1} = G^T \pi_k$, but it avoids using $G$
by implicitly incorporating the scores of the dangling pages and
teleportation in computing $\omega$. Note that the issue of accommodating
dangling pages in Page Rank has been considered a challeng- 
ing research issue lfl3l l6ll. 
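
To make the structure of Algorithm 1 concrete, the following is a minimal Python sketch of the sparse power iteration. It is an illustration only: the adjacency-list input out_links, the function name pagerank, the stopping rule, and the choice u = v for the dangling patch are assumptions made here, not details given in the text.

```python
def pagerank(out_links, c=0.85, v=None, tol=1e-12, max_iter=1000):
    """Sketch of Algorithm 1 on an adjacency-list graph.

    out_links[i] lists the pages that page i links to (hypothetical input
    format); a dangling page has an empty list. Assumes u = v.
    """
    n = len(out_links)
    if v is None:
        v = [1.0 / n] * n                  # uniform teleportation vector
    x = list(v)                            # x_0 with ||x_0||_1 = 1
    for _ in range(max_iter):
        # x_{k+1} = c * H^T x_k, touching only the existing links.
        nxt = [0.0] * n
        for i, targets in enumerate(out_links):
            if targets:
                share = c * x[i] / len(targets)
                for j in targets:
                    nxt[j] += share
        # omega = ||x_k||_1 - ||x_{k+1}||_1 is the mass lost to dangling
        # pages and to the (1 - c) teleportation step; it is redistributed
        # according to v without ever forming the dense matrix G.
        omega = sum(x) - sum(nxt)
        nxt = [nxt[j] + omega * v[j] for j in range(n)]
        delta = sum(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if delta < tol:
            break
    return x


# Small example: page 3 is dangling and has no incoming links.
links = [[1, 2], [2], [0], []]
ranks = pagerank(links)
```

Each sweep costs time proportional to the number of links plus the number of pages, which is the point of exploiting the sparse-plus-rank-one structure of G.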



3 MDPs and the Value Function 

A Markov Decision Process (MDP) is defined by a 5-tuple
$(\mathcal{S}, \mathcal{A}, P^{\mathcal{A}}, \mathcal{R}^{\mathcal{A}}, \gamma)$, where $\mathcal{S}$ denotes the state space; $\mathcal{A}$ is the
action space; $P^{\mathcal{A}}$ is a transition model with $P^a(s, s')$ being
the probability of transitioning to state $s'$ after taking action
$a$ at state $s$; $\mathcal{R}^{\mathcal{A}}$ is a reward model with $\mathcal{R}^a(s, s')$ being the
reward of taking action $a$ in state $s$ and transitioning to state
$s'$; and $\gamma \in [0, 1)$ is a discount factor (HI HB HI .

A policy $\pi$ maps a state $s$ and an action $a$ into a probability
$\pi(s, a)$ of choosing that action in the state. The value of a
state $s$ under a policy $\pi$ is the discounted long-term future
rewards received following the policy,

$$V^\pi(s) = E_\pi\left\{ \sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, s_0 = s \right\},$$

where $r_t$ is the reward received by the agent at time $t$, and $E_\pi$
is the expectation taken with respect to the distribution of the
states under the policy.

The value function satisfies an equality called the Bellman
equation. In particular, for any state $s \in \mathcal{S}$,

$$V^\pi(s) = \gamma \sum_{s' \in \mathcal{S}} P^\pi(s, s')\, V^\pi(s') + \bar{r}^\pi(s), \tag{1}$$

where $P^\pi(s, s')$ is the probability of transitioning from $s$ to $s'$
following the policy, and $\bar{r}^\pi(s)$ is the expected immediate
reward of leaving state $s$ following the policy.
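
As a small illustration of equation (1), the sketch below evaluates a fixed policy on a three-state chain by repeatedly applying the right-hand side of the Bellman equation. The transition matrix, rewards, discount factor, and the function name evaluate_policy are made-up values and names chosen for the example, not taken from the text.

```python
def evaluate_policy(P, r_bar, gamma, tol=1e-12, max_iter=10000):
    """Iterate V <- gamma * P V + r_bar until successive values agree.

    P[s][t] is the policy-induced transition probability from s to t and
    r_bar[s] is the expected immediate reward of leaving s.
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(max_iter):
        new_V = [
            r_bar[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    return V


# Hypothetical 3-state example; every row of P sums to 1 (forward direction).
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
r_bar = [1.0, 0.0, 0.0]
V = evaluate_policy(P, r_bar, gamma=0.5)
```

Solving the three linear equations by hand gives V = (16/11, 4/11, 8/11) for this example, which the iteration approaches geometrically.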

4 The Reinforcement Ranking Framework 

We now introduce the reinforcement ranking framework, 
which models search and ranking in terms of an MDP. The 
framework is composed of the following elements. 

The agent and the environment. The agent is a surfer model 
and the environment is a set of hyperlinked documents on 
which the surfer explores. That is, we consider the Web to be 
the environment; the surfer acts by sending requests that are 
processed by servers on the Web. This is a simple model of 
everyday surfing that stresses the subjectiveness of the surfer 
as well as the objective structure of the Web, in contrast to 
Page Rank which models surfing as a goal-less random walk. 

The rewards. According to [33], a reward function "maps
each perceived state (or state-action pair) of the environment
to a single number, a reward, indicating the intrinsic desirability
of that state". Intuitively, a reward is a signal that
evaluates an action. A surfer can click many hyperlinks on
a page. If a click leads to a page that satisfies the surfer's
needs, then a large reward is received; otherwise it incurs a
small reward. From the perspective of information retrieval,
the reward represents information gain from reading a page.
The introduction of rewards to search and ranking is important
because it highlights the fact that a page has intrinsic
importance to users. In fact, this is a key difference from what
has been pursued in the link analysis literature, which does
not normally model pages as having intrinsic values. The reward
hypothesis is also important because surfing and search
are purposeful in this model: actions are taken to achieve rewards.
In this paper, we will be considering a special reward
function, in which $\mathcal{R}^a(s, s') = r(s')$, where $r$ is a function
mapping from the state space to real numbers. This means
the reward of transitioning to a state is uniquely determined
by the state itself.



The actions and the states. An action is the click of a hyperlink
on a page. A state is a web page. The current state
is the current page being visited by the surfer. After clicking
on a hyperlink, the surfer can observe the linked page or
a failed connection. For simplicity, we assume that all links
are good in this paper. That is, the next state is always the
page that an action leads to. Therefore, the state space $\mathcal{S}$ is
the set of the web pages. The action space on a page $s$, denoted
$\mathcal{A}(s)$, is the set of actions that lead to the linked pages
from $s$. The overall action space is defined by the union of
the actions available on each page, i.e., $\mathcal{A} = \bigcup_{s \in \mathcal{S}} \mathcal{A}(s)$.

The surfing policy and transition model. A surfer policy
specifies how hyperlinks are followed at web pages. Based on
the above definitions relating web search to an MDP model,
we can equate a surfer with a standard MDP policy as specified
in Section 3. For web search, we also assume the transitions
are deterministic; that is, clicking a hyperlink on a particular
page always leads to the same successor page, hence
$P^a(s, s') = 1$ for all $a \in \mathcal{A}(s)$, where $s'$ is the page that $a$
leads to. This treatment simplifies
the problem without losing generality; it is straightforward
to extend our work to the other cases.
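
The elements above can be sketched in a few lines of Python. The example assumes a uniform random surfer policy (the one used later in the experiments) and a hypothetical adjacency-list input; the function name uniform_surfer is introduced here for illustration. With deterministic transitions, the policy-induced matrix has entry 1/|A(s)| for every page linked from s, and the row of a dangling page is left at zero rather than patched.

```python
def uniform_surfer(out_links):
    """Build the policy-induced transition matrix for a uniform surfer.

    out_links[s] lists the pages reachable by one click from page s
    (hypothetical input format). Dangling pages keep an all-zero row,
    so the matrix is generally not row stochastic.
    """
    n = len(out_links)
    P = [[0.0] * n for _ in range(n)]
    for s, targets in enumerate(out_links):
        for t in targets:
            # pi(s, a) = 1 / |A(s)| and the click a leads to t with certainty.
            P[s][t] += 1.0 / len(targets)
    return P


# Page 3 is dangling: it has no outgoing links.
P = uniform_surfer([[1, 2], [2], [0], []])
```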

4.1 The Authority Function 

Given these associations established between web surfing and 
an MDP, we can now develop a web page authority function 
in the framework of reinforcement ranking. In particular, we 
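define the authority score of a page to be the rewards accumulated
by its predecessors under the surfing policy. That is,
for a page $s \in \mathcal{S}$, its authority score under surfing policy $\pi$ is

$$R^\pi(s) = \sum_{k=0}^{\infty} \gamma^k r^{(k)}(s), \tag{2}$$

where $\gamma$ is the discount factor, $r(s)$ is a reward that is dependent
on $s$, and $r^{(k)}(s)$ is the reward carried from the $k$-step
predecessors of $s$ to $s$ by the policy. Note that in equation (2),
$r^{(0)} = r$, and if a page $s$ has no predecessor, $R^\pi(s)$
can be set to $r(s)$ or some other default value. The $k$-step
historical rewards to a state $s$ are defined as follows, where
$P^\pi_{p,s}$ denotes $P^\pi(p, s)$, the probability of moving from $p$ to $s$
under the policy.

$r^{(1)}(s)$ captures the one-step rewards propagated into $s$:

$$r^{(1)}(s) = \sum_{p \in \mathcal{S}} P^\pi_{p,s}\, r(p);$$

$r^{(2)}(s)$ captures the rewards from the two-step predecessors:

$$r^{(2)}(s) = \sum_{p \in \mathcal{S}} P^\pi_{p,s} \sum_{p' \in \mathcal{S}} P^\pi_{p',p}\, r(p');$$

$r^{(3)}(s)$ captures the 3-step rewards propagated into $s$:

$$r^{(3)}(s) = \sum_{p \in \mathcal{S}} P^\pi_{p,s} \sum_{p' \in \mathcal{S}} P^\pi_{p',p} \sum_{p'' \in \mathcal{S}} P^\pi_{p'',p'}\, r(p''); \quad \text{etc.}$$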
Note that the discount factor $\gamma$ plays an important role in
this model, since it controls the effective horizon over which
reward is accumulated. If $\gamma$ is large, the authority score will
consider long chains of predecessors that lead into a page. If
$\gamma$ is small, the authority score will only consider predecessors
that are within a few steps of the page. This gives a new
interpretation for the dampening factor in PageRank. Previously,
it has been commonly recognized that the larger the dampening
factor in PageRank, the closer the score vector reflects
the true link structure of the graph, e.g. see ll9l Uol l3l ll4l ITTTl . 
While the two interpretations do not contradict each other, 
viewing the dampening/discount factor as a control over the 
distance of looking back from pages is surely both essential 
and intuitive. 
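
The role of the look-back distance can be seen in a small Python sketch that computes the k-step historical rewards and a truncated version of the authority score. The three-page chain, the uniform rewards, the discount factor of 0.5, and the function names are illustrative assumptions, and truncating the sum at a finite depth is a simplification of the infinite sum in the definition.

```python
def historical_rewards(P, r, depth):
    """Return [r^(0), r^(1), ..., r^(depth)].

    r^(k)[s] is the reward carried to page s from its k-step predecessors,
    computed as r^(k)(s) = sum_p P[p][s] * r^(k-1)(p).
    """
    n = len(r)
    levels = [list(r)]
    for _ in range(depth):
        prev = levels[-1]
        levels.append(
            [sum(P[p][s] * prev[p] for p in range(n)) for s in range(n)]
        )
    return levels


def truncated_authority(P, r, gamma, depth):
    """Sum gamma^k * r^(k)(s) for k = 0..depth (a finite look-back)."""
    levels = historical_rewards(P, r, depth)
    return [
        sum(gamma ** k * levels[k][s] for k in range(depth + 1))
        for s in range(len(r))
    ]


# Hypothetical chain 0 -> 1 -> 2 with deterministic clicks.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
r = [1.0, 1.0, 1.0]
full = truncated_authority(P, r, gamma=0.5, depth=3)
short = truncated_authority(P, r, gamma=0.5, depth=1)
```

In this chain, page 2 scores 1.75 when its two-step predecessor is included and 1.5 when only direct predecessors are counted, while page 0, which has no predecessors, keeps its own reward of 1 in both cases.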

4.2 The Reverse Bellman Equation 

Although the authority score function $R^\pi$ appears to be similar
to a standard value function $V^\pi$, they are not isomorphic
concepts: the value function (1) is defined in terms of the forward
accumulated rewards. The reverse function (2) cannot
be reduced to the forward definition (1) because the transition
probabilities are not normalized in both directions; they
are only normalized in the forward direction. In particular,
(1) is an expectation, whereas (2) cannot be an expectation in
general. Despite this key technical difference, it is interesting
(and ultimately very useful) that the authority function also
satisfies a reverse form of the Bellman equation.

Theorem 1 (Reverse Bellman Equation) The authority 
function satisfies the reverse Bellman equation for all s: 



$$R^\pi(s) = \gamma \sum_{p \in \mathcal{S}} P^\pi_{p,s}\, R^\pi(p) + r(s). \tag{3}$$



Proof: First observe that the $k$-step rewards can be expressed
in terms of the $(k-1)$-step rewards; that is,

$$r^{(1)}(s) = \sum_{p \in \mathcal{S}} P^\pi_{p,s}\, r(p), \qquad r^{(2)}(s) = \sum_{p \in \mathcal{S}} P^\pi_{p,s}\, r^{(1)}(p), \qquad \text{etc.}$$

Therefore, from the definition of $R^\pi$ in (2), one obtains

$$\begin{aligned}
R^\pi(s) &= r(s) + \gamma \left[ r^{(1)}(s) + \gamma r^{(2)}(s) + \dots \right] \\
&= r(s) + \gamma \Big[ \sum_{p \in \mathcal{S}} P^\pi_{p,s}\, r(p) + \gamma \sum_{p \in \mathcal{S}} P^\pi_{p,s}\, r^{(1)}(p) + \dots \Big] \\
&= r(s) + \gamma \sum_{p \in \mathcal{S}} P^\pi_{p,s} \left[ r(p) + \gamma r^{(1)}(p) + \dots \right] \\
&= r(s) + \gamma \sum_{p \in \mathcal{S}} P^\pi_{p,s}\, R^\pi(p). \qquad \Box
\end{aligned}$$



The standard Bellman equation (1) looks forward from a
state to define its value, but equation (3) looks backward from
a state to define its authority. Therefore, we call equation (3)
the reverse Bellman equation (RBE for short). Similar to Page
Rank, $R^\pi$ determines the authority of a page based on its back
links. However, $R^\pi$ is well defined without teleportation. In
particular, the surfer model $P^\pi$ is defined on the link structure
only, without any teleportation. Notice that $P^\pi$ is not necessarily
irreducible or aperiodic; in fact, it is not even row stochastic
for dangling pages. Yet, perhaps surprisingly, one is
still able to achieve a well defined authority score $R^\pi$, which
is not possible under classical Markov chain theory.

Theorem 2 For $\gamma \in [0, 1)$, any policy $\pi$ and any bounded
reward function $r$, $R^\pi$ is finite.



Algorithm 2 Reverse Bellman iteration for computing $R^\pi$:
no special treatment is required for dangling pages.
    Initialize $R_0$
    repeat
        $R_{k+1} = \gamma (P^\pi)^T R_k + r$
    until desired accuracy is reached



Proof: It can be shown that the spectral radius of $\gamma P^\pi$ is
strictly smaller than 1 for any well defined policy. Therefore
$I - \gamma (P^\pi)^T$ is invertible. Additionally,
$(I - \gamma (P^\pi)^T)^{-1} = \sum_{t=0}^{\infty} (\gamma (P^\pi)^T)^t$ by El Theorem 1.5]. Therefore,
$R^\pi = (I - \gamma (P^\pi)^T)^{-1} r$, hence $R^\pi$ is finite for any policy and any
bounded reward function $r$. □
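
A minimal Python sketch of the reverse Bellman iteration in Algorithm 2 is given below. It assumes a uniform random surfer policy and a hypothetical adjacency-list input; the function name, the max-difference stopping rule, and the default tolerance are choices made for this illustration rather than details from the text.

```python
def reverse_bellman_iteration(out_links, r, gamma=0.85, R0=None,
                              tol=1e-10, max_iter=10000):
    """Iterate R <- gamma * (P^pi)^T R + r under a uniform surfer policy.

    out_links[p] lists the pages linked from page p; dangling pages have
    an empty list and simply propagate nothing. Returns (R, iterations).
    """
    R = list(r) if R0 is None else list(R0)
    for it in range(1, max_iter + 1):
        new_R = list(r)                       # the + r term
        for p, targets in enumerate(out_links):
            if targets:
                # Page p passes gamma * R[p] * P[p][s] to each successor s.
                share = gamma * R[p] / len(targets)
                for s in targets:
                    new_R[s] += share
        diff = max(abs(a - b) for a, b in zip(new_R, R))
        R = new_R
        if diff < tol:
            return R, it
    return R, max_iter


# Page 3 is dangling and has no predecessors, so it keeps its own reward.
links = [[1, 2], [2], [0], []]
rewards = [1.0, 1.0, 1.0, 1.0]
R, iterations = reverse_bellman_iteration(links, rewards)

# On the chain 0 -> 1 -> 2 the scores are 1, 1 + gamma, 1 + gamma + gamma^2.
chain_scores, _ = reverse_bellman_iteration([[1], [2], []], [1.0] * 3, gamma=0.5)
```

Compared with the Page Rank sweep, there is no correction term to compute and no branch for patching dangling pages: an empty link list contributes nothing to its successors.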

The practical significance of the RBE is that it yields an
efficient algorithm for computing $R^\pi$, based on a backward
version of value iteration as used in dynamic programming
and reinforcement learning; see Algorithm 2. To establish
the correctness of this algorithm we first need a lemma. Let
$\|\cdot\|$ be the L-2 norm, $\|R\| = \left(\sum_{i=1}^{N} R(i)^2\right)^{1/2}$.

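Lemma 1 For any $R \in \mathbb{R}^N$, we have $\|(P^\pi)^T R\| \le \|R\|$.
Proof:

$$\|(P^\pi)^T R\|^2 = \sum_{i=1}^{N} \Big( \sum_{h=1}^{N} P^\pi_{h,i} R(h) \Big)^2 \le \sum_{i=1}^{N} \sum_{h=1}^{N} P^\pi_{h,i} R(h)^2 = \sum_{h=1}^{N} \sum_{i=1}^{N} P^\pi_{h,i} R(h)^2 = \sum_{h=1}^{N} R(h)^2 = \|R\|^2. \qquad \Box$$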

Note here we used the ordinary L-2 norm rather than the 
weighted L-2 norm, as is common in reinforcement learning. 

Theorem 3 For $\gamma \in [0, 1)$ and finite $r$, the update of Algorithm 2
has a unique fixed point to which the iteration must converge.

Proof: The proof follows the Banach fixed-point theorem
given in 0]. Define the mapping $T^\pi : \mathbb{R}^N \to \mathbb{R}^N$ by
$T^\pi(R) = \gamma (P^\pi)^T R + r$. $T^\pi$ is a contraction mapping in
the L-2 norm because

$$\|T^\pi(R_1) - T^\pi(R_2)\| = \gamma \|(P^\pi)^T (R_1 - R_2)\| \le \gamma \|R_1 - R_2\|,$$

according to Lemma 1. It follows that the iteration converges
to the unique fixed point $R^\pi = T^\pi(R^\pi)$. □

This approach to computing an authority ranking has several
advantages over Page Rank. First, Algorithm 2 does
not compute an additional $\omega$ factor (which requires 2N additional
flops per iteration). Second, no special treatment is
required for dangling pages, which has generally been considered
tricky for Page Rank 11131 loll. Finally, there is a significant
improvement in computation cost and sensitivity for
Algorithm 2 over Algorithm 1.

5 Sensitivity 

To assess the relative sensitivities of Page Rank and reinforce- 
ment ranking to changes in the graph topology, we establish 
a few useful facts. First, an important feature of the rein- 
forcement based authority function is that it decomposes over 
disjoint subgraphs. 







Figure 1: A small graph example. 

Proposition 1 (Disjoint Independence) For a graph consisting
of separate subgraphs, the $R^\pi$ vector is given by the
union of the local $R^\pi$ vectors over the disjoint subgraphs.
(Straightforward consequence of the definition.)

As the Web grows, subgraphs are often added that have 
only limited connection to the remainder of the web. In such 
cases, the $R^\pi$ score remains largely unchanged, whereas Page
Rank is globally affected due to teleportation. In fact, merely 
increasing graph size affects the Page Rank scores for a fixed 
subgraph, since the teleportation vector changes. 

Another independence property of reinforcement ranking 
is that the $R^\pi$ score for altruistic subgraphs (subgraphs with
only outgoing and no incoming links) is not affected by any 
external changes to the graph that do not impact altruism. 

Proposition 2 (Altruistic Independence) The local $R^\pi$ vector
for an altruistic subgraph cannot be affected by external
graph changes, provided no new incoming links to the sub- 
graph are created. (Immediate consequence of the definition.) 

Again, Page Rank cannot satisfy altruism independence 
due to the global effect introduced by teleportation. 

Intuitively, separate websites (i.e. separate subgraphs) 
grow in a nearly independent manner. Reinforcement ranking
is more stable with respect to independent subgraph
changes, whereas the stationary distribution of Page Rank must
react globally to even local changes. To illustrate the point,
consider the example in Figure 1. First, suppose the link
from 5 to 6 is not present; in which case the graph consists 
of two disjoint subgraphs. For reinforcement ranking, any 
local changes within the subgraphs (including adding new 
pages) cannot affect the authority scores in the other sub- 
graph, provided no connecting links are introduced between 
them. However, the stationary distribution for Page Rank 
must be affected even by disjoint updates. Next, consider 
the effect of adding a link from 5 to 6, which connects the 
two subgraphs. In this case, changes to the right subgraph 
will still not affect the reinforcement scores of the left sub- 
graph if no new links are introduced from the right to the left, 
whereas Page Rank is affected. Finally, deleting the link from 
node 1 to node 4 has no influence on node 2 under reinforce- 
ment ranking (only the successors of node 4 are influenced), 
whereas the Page Rank of node 2 will generally change. 
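
Both independence properties can be checked numerically on a toy graph. The Python sketch below uses a hypothetical five-page graph (not the graph of Figure 1), uniform rewards, a uniform surfer policy, and a fixed number of sweeps in place of a convergence test; all of these are assumptions made for illustration.

```python
def authority(out_links, gamma=0.85, sweeps=400):
    """Reverse Bellman sweeps with uniform rewards r(s) = 1 (toy helper)."""
    n = len(out_links)
    R = [1.0] * n
    for _ in range(sweeps):
        new_R = [1.0] * n
        for p, targets in enumerate(out_links):
            for s in targets:
                new_R[s] += gamma * R[p] / len(targets)
        R = new_R
    return R


# Disjoint case: pages 0-2 form one subgraph, pages 3 and up another.
before = authority([[1], [2], [0], [4], [3]])
after = authority([[1], [2], [0], [4, 5], [3], [3]])       # page 5 added on the right

# Altruistic case: page 2 now also links into the right subgraph,
# but nothing links back into pages 0-2.
alt_before = authority([[1], [2], [0, 3], [4], [3]])
alt_after = authority([[1], [2], [0, 3], [4, 5], [3], [3]])

left_unchanged = all(abs(a - b) < 1e-9 for a, b in zip(before[:3], after[:3]))
alt_left_unchanged = all(
    abs(a - b) < 1e-9 for a, b in zip(alt_before[:3], alt_after[:3])
)
right_changed = abs(alt_before[3] - alt_after[3]) > 1e-6
```

In both cases the scores of pages 0 to 2 are identical before and after the change on the right, while the score of page 3 moves, because scores only flow along links from predecessors to successors.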
Figure 2: Convergence rate of Page Rank.



Figure 3: Convergence rate of reinforcement ranking. 



The implication is that the reinforcement based authority 
score is more stable to innocuous changes to the Web graph 
than Page Rank, which has consequences for both the effi- 
ciency of the update algorithms as well as the quality of their 
respective authority scores, as we now demonstrate. 

6 Experimental Results 

We conducted experiments on real world graphs (Wikipedia 
and DBLP) to evaluate two aspects of reinforcement ranking 
and Page Rank. First, we compared these methods on the 
updating problem: how quickly can the score function be up- 
dated given changes to the underlying graph? Second, we in- 
vestigated the overall quality of the score functions produced. 
Sensitivity and the Updating Problem. Intuitively, the 
speed with which an iterative method can update its scores for 
a modified graph is related to the sensitivity of its score func- 
tion. If the score is not significantly affected by the graph up- 
date, then initializing the procedure from the previous scores 
reduces the number of iterations needed to converge. Con- 
versely, if the new score is significantly different than its pre- 
decessor, one expects that many more iterations will be re- 
quired to converge. Indeed, we find that this is the case: Page 
Rank demonstrates far more score sensitivity to graph modi- 
fication, and consequently it is significantly outperformed by 
reinforcement ranking in the updating problem. 

To investigate this issue, we ran experiments on a set of 
real world graphs extracted from Wikipedia dumps taken at 
different times. In particular, we used graphs extracted from 
dumps on Oct-2008, Nov-2009, Oct-2010 and Jan-2011. 
These are large and densely connected graphs; for example,
the Jan-2011 graph contains 6,832,616 articles and
144,231,297 links. For both methods, we used a uniform random
surfer policy, and a discount/dampening factor of 0.85.
For Page Rank, we set the teleportation vector to uniform, 
and for reinforcement ranking we used uniform rewards. To 
evaluate a given method's ability to cope with graph updates, 
we measured its rate of convergence to the new solution, 
as well as the relative advantage of initializing from the 
previous solution versus initializing uniformly. In particular,



the plots show the results for the (initialization, target) pairs: 



color     initializer   target     Δnodes%     Δlinks%
red       Oct-2010      Jan-2011   +12, -4     +17, -8
green     Nov-2009      Oct-2010   +18, -5     +39, -20
blue      Nov-2009      Jan-2011   +19, -5     +46, -24
magenta   Oct-2008      Jan-2011   +49, -18    +65, -41



Here, + indicates the percentage of new nodes/links added, 
and - denotes the percentage of nodes/links deleted between the
initial and target graphs. Figures 2 and 3 compare the relative
rate of convergence of Page Rank versus reinforcement 
ranking. Note that, given its sensitivity, Page Rank is not 
able to exploit a previous solution to significantly improve 
the time taken to converge to a new solution for an updated 
graph: uniform initialization performs as well. This confirms 
Google's report that historical update based power iteration 
does not improve the accuracy for Page Rank [25]. By 
contrast, reinforcement ranking exhibits far less sensitivity 
and therefore demonstrates significantly faster convergence 
when initialized from a previous graph's score function. 
Practically this means that, initialized with a historical 
update from three months prior, the reinforcement score can 
be computed about 10 times more accurately than with a 
uniform initialization. 
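
The effect of initialization can be reproduced in miniature. The Python sketch below is a toy illustration, not a reproduction of the Wikipedia experiments: the graphs, the tolerance, and the function name iterate are assumptions. It updates a small graph by adding one page and compares the number of reverse Bellman sweeps needed from a uniform start with the number needed when starting from the previous scores.

```python
def iterate(out_links, r, R0, gamma=0.85, tol=1e-10, max_iter=10000):
    """Reverse Bellman iteration from R0; returns (scores, sweeps used)."""
    R = list(R0)
    for it in range(1, max_iter + 1):
        new_R = list(r)
        for p, targets in enumerate(out_links):
            for s in targets:
                new_R[s] += gamma * R[p] / len(targets)
        diff = max(abs(a - b) for a, b in zip(new_R, R))
        R = new_R
        if diff < tol:
            return R, it
    return R, max_iter


# Old graph: a cycle 0 -> 1 -> 2 -> 0 and a separate chain 3 -> 4.
old_links = [[1], [2], [0], [4], []]
# New graph: page 5 is added, linked from page 4.
new_links = [[1], [2], [0], [4], [5], []]

old_R, _ = iterate(old_links, [1.0] * 5, [1.0] * 5)

r_new = [1.0] * 6
_, cold_iters = iterate(new_links, r_new, r_new)      # uniform initialization
warm_start = old_R + [1.0]                            # previous scores, new page at r
_, warm_iters = iterate(new_links, r_new, warm_start)
```

Because the change is confined to the chain, the previous scores of the cycle are already at their fixed point and the warm start only has to propagate the new page's score, so it needs far fewer sweeps than the uniform start.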

Ranking Quality. To assess the ranking quality of the two 
methods we performed an experiment on the DBLP graph 
0(1, which consists of 1,572,278 nodes and 2,083,947 
links. We chose this network because citation links are usu- 
ally reliable, reducing the effects of spam and low quality 
links. For this experiment, we used the same parameters as 
before, except that for reinforcement ranking we used a his- 
tory depth of 3. 

To illustrate the ranking quality achieved by Page Rank and 
reinforcement ranking, we show the highest ranked papers 
according to each method in Tables 1 and 2, respectively. We
used the latest citation counts retrieved from Google
Scholar on March 24, 2013 as the ground truth for paper
quality. Note that this oracle considers future citations
received in the four years after the link graph
was extracted. In addition, Google Scholar considers many
more citation sources than DBLP. Although the results exhibit
some noise, it is clear that the Page Rank scores in Table 1
are generally inferior: observe the prevalence of "outlier"
papers that have very few citations (for example, the papers
ranked 1, 6, and 22 have 51, 73, and 42 citations, respectively).
By contrast, the reinforcement based ranking in Table 2 completely
avoids papers with low citation counts. Due to the relative purity of
the links in this graph, it is reasonable to expect that a shallow
history depth of 3 should be sufficient to safely identify influential
papers in the reinforcement approach. On the other hand,
Page Rank, which considers long term random walks, appears
to be derailed by noise in the graph and produces more erratic
results.

Table 1: Top papers according to Page Rank.

Rank  Paper Title                                                     #Cites
1     A Unified Approach to Functional Dependencies and Relations     51
2     On the Semantics of the Relational Data Model                   167
3     Database Abstractions: Aggregation and Generalization           1518
4     Smalltalk-80: The Language and Its Implementation               5496
5     A Characterization of Ten Hidden-Surface Algorithms             847
6     An algorithm for hidden line elimination                        73
7     Introduction to Modern Information Retrieval                    9056
8     C4                                                              20913
9     Introduction to Algorithms                                      30715
10    Compilers: Principles, Techniques, and Tools                    11598
11    Congestion avoidance and control                                6078
12    A Stochastic Parts Program and Noun Phrase Parser for ...       1314
13    Illumination for Computer Generated Pictures                    2504
14    Graph-Based Algorithms for Boolean Function Manipulation        8252
15    Programming semantics for multiprogrammed computations          777
16    Time, Clocks, and the Ordering of Events in a Distributed ...   7720
17    Reentrant Polygon Clipping                                      373
18    Computational Geometry - An Introduction                        8558
19    A Computing Procedure for Quantification Theory                 2579
20    A Machine-Oriented Logic Based on the Resolution Principle      4077
21    Beyond the Chalkboard: Computer Support for Collaboration       1079
22    A Stochastic Approach to Parsing                                42
23    Report on the algorithmic language ALGOL 60                     646

Table 2: Top papers according to $R_3$ (3-step history).

Rank  Paper Title                                                     #Cites
1     C4                                                              20913
2     Introduction to Algorithms                                      30715
3     Introduction to Modern Information Retrieval                    9056
4     Smalltalk-80: The Language and Its Implementation               5496
5     Compilers: Principles, Techniques, and Tools                    11598
6     Graph-Based Algorithms for Boolean Function Manipulation        8252
7     Computational Geometry - An Introduction                        8558
8     Congestion avoidance and control                                6078
9     Time, Clocks, and Ordering of Events in Distributed Sys...      7720
10    Induction of Decision Trees                                     11561
11    Mining Association Rules between Sets ...                       12342
12    A Performance Comparison of Multi-Hop Wireless ...              4936
13    Fast Algorithms for Mining Association Rules ...                13827
14    Highly Dynamic Destination-Sequenced ... Routing ...            6731
15    A Stochastic Parts Program and Noun Phrase ...                  1314
16    Support-Vector Networks                                         10523
17    A Machine-Oriented Logic Based on Resolution Principle          4077
18    A Theory for Multiresolution Signal Decomposition...            15897
19    An information-maximization approach to blind separation ...    5871
20    The Anatomy of a Large-Scale Hypertextual Web Search ...        10122
21    The Complexity of Theorem-Proving Procedures                    4876
22    Combinatorial Optimization: Algorithms and Complexity           7050
23    A Computing Procedure for Quantification Theory                 2579

7 Discussion 

A key challenge faced by Page Rank is coping with dangling 
pages. Although some dangling pages genuinely do not have 
any outlinks, many are left "dangling" simply because crawls 
are incomplete. In practice, the number of dangling pages 
can even dominate the number of non-dangling pages 01311 . 
Page et al. (1998) first removed dangling pages (and the links 
to them) before computing the Page Rank for the remain- 
ing graph, re-introducing dangling pages afterward. Such a 
process, however, does not compute the Page Rank on the 
original graph. Moreover, removing dangling pages produces 
more dangling pages. In general, many approaches have been 
proposed to solve this problem, but it does not appear to be 
definitively settled for Page Rank; see, e.g., lfl3l mil. This is 
not a challenge for reinforcement ranking. 

Recently, versions of Page Rank have been formulated using
linear system theory (e.g., see [15, 6, 25]). However, the
justification for these formulations inevitably returns to ran- 
dom walks, teleportation, and the resulting stationary distri- 
butions. As we have observed, such foundations tend to lead 
to globally sensitive ranking methods. Our work explains 
and justifies a linear system formulation in a different way. 
We generalize the teleportation vector to rewards that evalu- 
ate the intrinsic importance of individual pages. Moreover, 



we have related the linear systems formulation to work in dy- 
namic programming and reinforcement learning, via an ac- 
cumulative rewards-based score function. It has previously 
been observed that using a c near 1 in this linear formulation 
still "often" converges, but the reason has not been well un- 
derstood dHHH]. However, we have shown that the authority 
function can be well defined and guaranteed to converge for 
any discount factor in (0, 1) and any well-defined surfing pol- 
icy, without using teleportation. 

There have been many attempts to formulate teleporta- 
tion for more sophisticated ranking, such as personalization 
[27, 20], query-dependent [30, 29], context-sensitive [18, 19],
and battling-link-spam ranking [17]. For example, the personalized
Page Rank surfer teleports to the bookmarks of a
user. However, these practices still rely on the stationary
distribution formulation for convergence. In fact, all of these can
be expressed even more naturally in a reinforcement ranking
framework, in which convergence is guaranteed. For example,
the preferences of different users can be modeled by
different reward functions over pages, influenced by book- 
marks. (Such reward functions can even be learned via in- 
verse reinforcement learning, allowing convenient general- 
ization across a large portion of the graph.) We can also ex- 
plain why the pages linked by the bookmarked pages also 
receive a high ranking, a fact first observed by [27]. In particular,
the nonzero rewards received by a user on their bookmarked
pages are also the historical rewards of the successor
pages of the bookmarked pages, hence the successor pages 
are also rewarded. 

8 Conclusion 

Formulating and viewing Page Rank as the stationary distribution
of random walks has long been recognized and practiced.
However, to guarantee its existence, stochasticity, ergodicity,
and irreducibility of the underlying Markov chain have to be
ensured. This is tricky for the Web, where there are
many dangling pages, sinks, and pages without any incoming
links. These problems are important to the theory and practice
of Page Rank, for which there are many solutions and
discussions.

We proposed an authority function based on historical rewards.
We used rewards to capture the intrinsic importance
of pages, without the need for teleportation or for constructing
well behaved Markov chains. We related the authority function
to the value function in dynamic programming and reinforcement
learning, and showed that the authority function
satisfies a reverse Bellman equation. Thus, at a high level,
our work establishes a theoretical foundation for the recent
linear system formulation of Page Rank. We proved that our
authority function is well defined for any discount factor in
(0, 1) and any surfing policy, by referring not to the stationary
distribution theory but to the contraction mapping technique.
Given that random walk models, a generalization of
Page Rank, have been used in various contexts, we believe
our work will contribute to the fields of information retrieval
and social networks.